Overview of the TREC 2011 Web Track
Abstract
The TREC Web Track explores and evaluates Web retrieval technology over large collections of Web data. In its current incarnation, the Web Track has been active since TREC 2009, when it included both a traditional adhoc retrieval task and a new diversity task [4]. The goal of the diversity task is to return a ranked list of pages that together provide complete coverage for a query while avoiding excessive redundancy in the result list. For TREC 2010, the track introduced a new Web spam task and Web-style, six-level relevance assessment for the adhoc task [5]. For TREC 2011, as recommended by participants at the track planning session held during TREC 2010, we dropped the spam task but continued the other tasks essentially unchanged.

As we did for TREC 2009 and TREC 2010, we based our TREC 2011 experiments on the billion-page ClueWeb09 collection created by the Language Technologies Institute at Carnegie Mellon University. The tasks use a common topic set and differ only in their evaluation methodology. Topics are created from the logs of a commercial search engine with the aid of tools developed at Microsoft Research [9]. Given a target query, these tools extract and analyze groups of related queries, using co-clicks and other information, to identify clusters of queries that highlight different aspects and interpretations of the target query. NIST uses these clusters for topic development. Each resulting topic is structured as a representative set of subtopics, each related to a different user need, and the selection of subtopics attempts to reflect a mix of genuine user requirements for the topic. For the adhoc task, documents are judged with respect to the topic as a whole. The relevance levels are similar in structure to those used in commercial Web search, including a spam/junk level. Moreover, the top two levels of the assessment structure are closely related to the homepage finding and topic distillation tasks that appeared in older Web Tracks.
For the diversity task, documents are judged with respect to the subtopics as well as to the topic as a whole. For TREC 2011, the topic selection process was modified slightly from previous years. For TREC 2009 and 2010, topics were chosen to be of medium-to-high frequency; TREC 2011 instead attempts to work with more obscure topics, which may still be underspecified (i.e., faceted) but may be less ambiguous. Search engines have difficulty with queries of this type, since they can rely less on click and anchor-text information and on popularity signals such as PageRank. With these tougher topics we hope to work in an area of Web retrieval that has received relatively little attention. Because fewer pages may be relevant to these topics, we may be able to create a more reusable collection with sufficiently complete judgments.
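Subtopic-level judgments make the diversity goal (cover the topic's subtopics, avoid redundancy) measurable. The following is a minimal sketch that computes subtopic recall at rank k, meaning the fraction of a topic's subtopics covered by at least one of the top k documents, and counts redundant documents, meaning relevant documents that add no new subtopic. This simple measure was chosen for illustration and is not necessarily the track's official evaluation metric. The document IDs, subtopic numbers, `EXAMPLE_QRELS`, and function names are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical subtopic-level judgments for one topic with 4 subtopics:
# each document ID maps to the set of subtopic numbers it is relevant to.
EXAMPLE_QRELS: Dict[str, Set[int]] = {
    "d1": {1},
    "d2": {1},
    "d3": {2, 3},
    "d4": set(),  # judged, but relevant to no subtopic
}


def subtopic_recall(ranking: List[str], qrels: Dict[str, Set[int]],
                    num_subtopics: int, k: int) -> float:
    """Fraction of subtopics covered by at least one of the top-k documents."""
    if num_subtopics <= 0:
        raise ValueError("num_subtopics must be positive")
    covered: Set[int] = set()
    for doc_id in ranking[:k]:
        # Unjudged documents are treated as relevant to no subtopic.
        covered |= qrels.get(doc_id, set())
    return len(covered) / num_subtopics


def redundant_count(ranking: List[str], qrels: Dict[str, Set[int]], k: int) -> int:
    """Count top-k documents that are relevant but add no new subtopic."""
    seen: Set[int] = set()
    redundant = 0
    for doc_id in ranking[:k]:
        subtopics = qrels.get(doc_id, set())
        if subtopics and subtopics <= seen:
            redundant += 1
        seen |= subtopics
    return redundant


if __name__ == "__main__":
    ranking_a = ["d1", "d2", "d4", "d3"]  # repeats subtopic 1 early
    ranking_b = ["d1", "d3", "d2", "d4"]  # reaches new subtopics first
    for name, ranking in (("A", ranking_a), ("B", ranking_b)):
        # prints: "A 0.25 1" then "B 0.75 0"
        print(name,
              subtopic_recall(ranking, EXAMPLE_QRELS, num_subtopics=4, k=2),
              redundant_count(ranking, EXAMPLE_QRELS, k=2))
```

Both rankings contain the same documents. Ranking B places the document covering subtopics 2 and 3 second, so it reaches 0.75 coverage at k = 2 with no redundant result, whereas ranking A spends its second slot on another page for subtopic 1. Subtopic 4 is covered by neither ranking, so full coverage is impossible with these judgments.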
Similar Resources
Overview of the TREC 2013 Crowdsourcing Track
In 2013, the Crowdsourcing track partnered with the TREC Web Track and had a single task to crowdsource relevance judgments for a set of Web pages and search topics shared by the Web Track. This track overview describes the track and provides analysis of the track’s results.
Overview of the TREC 2013 Federated Web Search Track (draft)
The goal of the TREC Federated Web Search track is to promote research related to federated search, in a realistic web setting. This overview paper discusses the main results of the FedWeb 2013 track. In this first year of the track, we focused on basic challenges in federated search: (1) resource selection, and (2) results merging. After an overview of the provided data collection and the rele...
Overview of the TREC 2010 Legal Track Notebook Draft 2010.10.25
The TREC 2010 Legal Track consisted of two distinct tasks: the learning task, in which participants were required to estimate the probability of relevance for each document, and the interactive task, in which participants were required to identify all relevant documents using a human-in-the-loop process. 2010 is the fifth year of the legal track, the third year of the interactive task within the ...
Publication date: 2011